Hypothesis testing with two population means

Let’s break down the steps

  • State our null and alternative hypothesis

  • Identify the distribution and compute the degrees of freedom

  • Find the critical value using what we know in step 2

  • Compute student’s t test or our ANOVA (we’ll get here this week)

  • Accept or reject our null hypothesis

One-tailed or two-tailed

We use a one-tailed if our hypothesis involves one direction {e.g.,., greater than, less than, bigger, smaller, etc.).


We use a two-tailed (most common) if we are looking into differences between two group without predicting the direction.


Newish Equations Incoming 🧮

Pooled Standard Deviation

A measure of dispersion obtained by pooling multiple sample data sets (in our case two) into one large data set to calculate an grouped standard deviation.


\[\sigma_p = \sqrt{\frac{{(n_1 - 1) \cdot \sigma^2_1 + (n_2 - 1) \cdot \sigma^2_2}}{{n_1 + n_2 - 2}}}\] \(n\) = sample size

\(\sigma\) = standard deviation

T-Statistic for Independent Samples

We use this value to compare with our critical value in order to accept or reject our null.

\[t = \frac{{\bar{X}_1 - \bar{X}_2}}{{\sigma_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}}\] \(\bar{x}\) = Sample mean

\(n\) = Sample size

\(\sigma_p\) = pooled standard deviation

Student’s t Test

A test that is used to compare the means of two groups, where the independent variable is categorical and our dependent variable is continuous.


Helps determine if the differences between means are reliably different or just due to chance.


We use the t-distribution here and we calculate the degrees of freedom by:

\[df = (n_1 + n_2) - 2\]

Example

Suppose we have two groups of students - Group A and Group B. We want to know if there is a significant difference (95% or 0.05) in their exam scores.

Group A

n = 20

\(\bar{x}\) = 75

\(\sigma\) = 10

Group B

n = 20

\(\bar{x}\) = 80

\(\sigma\) = 12

Null Hypothesis \(H_0\): There is no meaningful difference in the mean exam scores between Group A and Group B.

Alternative Hypothesis \(H_A\): There is a meaningful difference in the mean exam scores between Group A and Group B.

Example

We use a t-distribution (p. 373) and we calculate the degrees of freedom to find our critical value.


\((n + n)- 2\)


\((20 + 20) - 2 = 38\)

  • Uh oh! There’s no 38 on the table! When this occurs we just use the value closest to our df. So here, we use 40.
  • 2.021

Example

1) Plug in our knowns


2) Calulate the numerator with our Order of Operations in mind


3) Calulate the denominator


4) Divide


5) Square root it

\[\sigma_p = \sqrt{\frac{{(n_1 - 1) \cdot \sigma^2_1 + (n_2 - 1) \cdot \sigma^2_2}}{{n_1 + n_2 - 2}}}\]

Example

1) Plug in our knowns


2) Calulate the numerator


3) Calulate the denominator (work the values underneath the \(\sqrt{}\) first)


4) Multiple by the \(\sigma\)


5) Divide

\[t = \frac{{\bar{X}_1 - \bar{X}_2}}{{\sigma_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}}\]

Example

Our critical value was \(\pm\) 2.021.


If the t-value is greater than the critical t-value, we reject the null hypothesis and conclude that there is a significant difference between the means of the two groups.


\(|-1.43| < 2.021\)

(you don’t have to use an absolute value here but it helps with interpreting)

\(1.43 < 2.021\)

Therefore, 🥁 drum roll 🥁

Example

We fail to reject the null hypothesis!


There is not enough evidence to conclude that there is a meaningful difference in the mean exam scores between Group A and Group B.

Analysis of Variance (ANOVA)


Used to examine a meaningful relationship between the means of more than two groups.


The independent variable must be categorical (e.g., ) and the dependent variable continuous.

Different Types of Variance in ANOVA

Between-group variance

  • Measures the extent each group is similar or different from other groups within the sample

  • The larger the variance, the more likely the groups are distinct from each other

Within-group variance

  • Measures the spread of data within each group

  • Considered to be the error or residual

  • Focuses on how much the values within deviate from the group’s mean

Many more equations incoming 🧮

The Overall Mean

\[\bar{X}_{\text{overall}} = \frac{\sum_{i=1}^{n} X_i}{n}\]


\({\sum_{i=1}^{n}}\): The sum of a series of terms, where \(n\) is the upper bound (which tells us how many terms to include) and \(i = 1\) is our lower bound which represents our starting point (i.e., our first group), also called our starting index


\(X_i\): each individual data point (i.e., our group means)


\(n\): Total number of observations

Sum of Squares Between (SSB)

\[SSB = \sum_{i=1}^{n} n_i (\bar{X}_i - \bar{X}_{\text{overall}})^2\]

\({\sum_{i=1}^{n}}\): The sum of a series of terms

\(n\): upper bound or the total number of groups

\(i\): lower bound index of summation or our starting point

\(n_i\): This represents the number of data points in each group (e.g., if there are 4 data points used to calc a mean, you’d use 4)

\(x_i\): This is the mean of the the data points each group.

\(\bar{X}_{\text{overall}}\): The overall mean

Sum of Squares Within (SSW)

\[SSW = \sum_{i=1}^{n} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2\]


\({\sum_{i=1}^{n}}\): The sum of a series of terms (i.e., summing across all groups) where \(n\): upper bound or the total number of groups and \(i\): lower bound index of summation or our starting point

\({\sum_{j=1}^{n_i}}\):Within each group \(i\), we are summing across all \(n_i\) observations in that group (where \(j\) is our starting point like \(i\))

\(X_{ij}\): Each data point within each group or \(jth\) observation in the \(ith\) group.

\(\bar{X}_i\): The group’s mean

Mean Square Between Groups

\[MSB = \frac{SSB}{df_B}\]


\(SSB\): The sum of square between value

\(df_B\): The degrees of freedom between groups (i.e., the number of groups - 1)

Mean Square Within Groups

\[MSW = \frac{SSW}{df_W}\]


\(SSB\): The sum of square within value

\(df_B\): The degrees of freedom within groups (i.e., the total number of observations - the total number of groups)

The F Statisiic

\[F = \frac{MSB}{MSW}\]


MSB: Mean Square Between

MSW: Mean Square Within

Example

A clinical trial is run to compare weight loss programs and participants are randomly assigned to one of the comparison programs and are counseled on the details of the assigned program. Participants follow the assigned program for 4 weeks. The outcome of interest is weight loss in pounds.

Low Calorie Low Fat Low Carb
5 3 6
4 4 4
5 2 1

\(H_0\) There is no difference between the diet programs.

\(H_A\) There is a difference between the diet programs.

Step 1

1 Find our within group means

Low calorie = \(4.66\)

Low fat = \(3\)

Low carb = \(3.66\)


2 Find our \(\bar{X}_{\text{overall}}\)

\(\frac{4.66 + 3 + 3.66}{3} =\)

  • \(3.77\)

\[\bar{X}_{\text{overall}} = \frac{\sum_{i=1}^{n} X_i}{n}\]

Step 2 Find SSB

  • Low calorie = \(3(4.66 - 3.77)^2\)

  • Low fat = \(3(3 - 3.77)^2\)

  • Low carb = \(3(3.66 - 3.77)^2\)

\[SSB = \sum_{i=1}^{n} n_i (\bar{X}_i - \bar{X}_{\text{overall}})^2\]

  • \(2.37 + 1.77 + 0.036 =\)

  • \(SSB = 4.176\)

Step 3 Find SSW

Low calorie


  • \((5 - 4.66)^2 + (4 - 4.66)^2 + (5 - 4.66)^2\)

  • \(0.66\)

Low fat

  • \((3 - 3)^2 + (4 - 3)^2 + (2 - 3)^2\)

  • \(2\)

Low carb

  • \((6 - 3.66)^2 + (4 - 3.66 )^2 + (1 - 3.66)^2\)

  • \(12.66\)

\[SSW = \sum_{i=1}^{n} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2\]


  • \(0.66 + 2 + 12.66 = 15.32\)

Step 4

\(df_B\): The degrees of freedom between groups (i.e., the number of groups - 1)

  • \(3 - 1 = 2\)

\(df_W\): The degrees of freedom within groups (i.e., the total number of observations - the total number of groups)

  • \(9 - 3 = 6\)

\[MSB = \frac{SSB}{df_B}\]

  • \(\frac{4.176}{2} = 2.088\)

\[MSW = \frac{SSW}{df_W}\]

  • \(\frac{15.32}{6} = 2.55\)

Step 5


\[F = \frac{MSB}{MSW}\]


\[ F = \frac {2.08}{2.55} = 0.81\]

Step 6

Head to the F Distribution Table on p. 377


\(df_B = 2\)

\(df_W = 6\)


\[Critical \, value = 5.14\]

  • \[0.81 < 5.14\]

  • We fail to reject the null, there is no difference in mean weight loss among the three diets.

Thank you for being a great class!